[PECOBLR-201] add variant support #560
base: main
Conversation
Thanks for your contribution! To satisfy the DCO policy in our contributing guide, every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase.
src/databricks/sql/thrift_backend.py (outdated)
@@ -692,12 +692,36 @@ def _col_to_description(col):
        else:
            precision, scale = None, None

        # Extract variant type from field if available
Are you sure this is correct? I tried and was getting metadata as null when the column type is variant. Also, for variant the pyarrow schema just shows string in my testing; shouldn't the server return the variant type?
Yes. Debug output:
[SHIVAM] field pyarrow.Field<CAST(1 AS VARIANT): string>
[SHIVAM] field metadata {b'Spark:DataType:SqlName': b'VARIANT', b'Spark:DataType:JsonType': b'"variant"'}
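For anyone reproducing this locally, here is a minimal pyarrow sketch (constructed from the debug output above, not fetched from a server) showing how a VARIANT column surfaces as a string field whose metadata carries the SQL type:

    import pyarrow as pa

    # Mimic what the debug output shows for a VARIANT column: the Arrow type
    # is plain string, and the SQL type rides along in the field metadata.
    field = pa.field(
        "CAST(1 AS VARIANT)",
        pa.string(),
        metadata={b"Spark:DataType:SqlName": b"VARIANT"},
    )

    print(field)           # pyarrow.Field<CAST(1 AS VARIANT): string>
    print(field.metadata)  # {b'Spark:DataType:SqlName': b'VARIANT'}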
@shivam2680 I am getting this as the arrow_schema, where metadata is null. Is this some transient behaviour, or am I missing something?
src/databricks/sql/thrift_backend.py (outdated)
        if field is not None:
            try:
                # Check for variant type in metadata
                if field.metadata and b"Spark:DataType:SqlName" in field.metadata:
                    sql_type = field.metadata.get(b"Spark:DataType:SqlName")
                    if sql_type == b"VARIANT":
                        cleaned_type = "variant"
            except Exception as e:
                logger.debug(f"Could not extract variant type from field: {e}")
Please check with eng-sqlgateway if there is a way to get this from the Thrift metadata; the Python connector uses Thrift metadata for fetching metadata.
Is there some documentation/contract around it, or is it purely from empirical evidence?
These schema bytes are read from t_result_set_metadata_resp.arrowSchema itself. Please refer to https://sourcegraph.prod.databricks-corp.com/databricks/databricks-sql-python/-/blob/src/databricks/sql/backend/thrift_backend.py?L812
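For context, a minimal sketch (the helper name is hypothetical) of how raw arrowSchema bytes from the Thrift response deserialize into a pyarrow schema whose fields carry the metadata discussed above:

    import pyarrow as pa

    def schema_from_arrow_bytes(schema_bytes: bytes) -> pa.Schema:
        # Deserialize the IPC-encoded schema carried in
        # t_result_set_metadata_resp.arrowSchema.
        return pa.ipc.read_schema(pa.py_buffer(schema_bytes))

    # Each field can then be inspected for the VARIANT marker:
    # for field in schema_from_arrow_bytes(schema_bytes):
    #     print(field.name, field.type, field.metadata)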
Can you please incorporate @pytest.mark.parametrize? I feel a lot of test code duplication can be avoided.
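For illustration, a sketch of the consolidation being suggested; the detection helper here is a hypothetical standalone copy of the logic in the diff above, not the connector's actual API:

    import pyarrow as pa
    import pytest

    def detect_sql_type(field, default="string"):
        # Hypothetical standalone copy of the variant-detection logic above.
        if field.metadata and field.metadata.get(b"Spark:DataType:SqlName") == b"VARIANT":
            return "variant"
        return default

    @pytest.mark.parametrize(
        "metadata, expected",
        [
            ({b"Spark:DataType:SqlName": b"VARIANT"}, "variant"),
            ({b"Spark:DataType:SqlName": b"STRING"}, "string"),
            (None, "string"),
        ],
    )
    def test_detect_sql_type(metadata, expected):
        field = pa.field("col", pa.string(), metadata=metadata)
        assert detect_sql_type(field) == expected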
tests/unit/test_thrift_backend.py (outdated)
@@ -2356,6 +2356,149 @@ def test_execute_command_sets_complex_type_fields_correctly(
            t_execute_statement_req.useArrowNativeTypes.intervalTypesAsArrow
        )

    def test_col_to_description_with_variant_type(self):
There is too much code duplication; multiple tests just need different arguments. Can you use pytest fixtures with arguments?
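If fixtures are preferred over bare parametrize, a parametrized fixture achieves the same consolidation (a sketch; the names are hypothetical):

    import pyarrow as pa
    import pytest

    @pytest.fixture(params=[(b"VARIANT", "variant"), (b"STRING", "string")])
    def variant_case(request):
        sql_name, expected = request.param
        field = pa.field(
            "col", pa.string(), metadata={b"Spark:DataType:SqlName": sql_name}
        )
        return field, expected

    def test_variant_detection(variant_case):
        field, expected = variant_case
        sql_name = (field.metadata or {}).get(b"Spark:DataType:SqlName")
        assert ("variant" if sql_name == b"VARIANT" else "string") == expected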
Description
This pull request introduces support for detecting and handling VARIANT column types in the Databricks SQL Thrift backend, along with corresponding tests for validation.
- Updated the _col_to_description and _hive_schema_to_description methods to process metadata for VARIANT types.
- Added unit and end-to-end tests to ensure proper functionality.
Testing details
End-to-End Tests: Added tests/e2e/test_variant_types.py to validate VARIANT type detection and data retrieval. Includes tests for creating tables with VARIANT columns, inserting records, and verifying correct type handling and JSON parsing.
Unit Tests: Added coverage in tests/unit/test_thrift_backend.py (e.g. test_col_to_description_with_variant_type) for VARIANT detection in _col_to_description.
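As a rough illustration of the path the end-to-end tests exercise (connection parameters are placeholders; this is a sketch, not the test code itself):

    from databricks import sql

    # Placeholder credentials; the real e2e tests read these from configuration.
    with sql.connect(
        server_hostname="...", http_path="...", access_token="..."
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT CAST(1 AS VARIANT) AS v")
            # With this PR, the description should report 'variant' where the
            # pre-change connector reported 'string'.
            print(cursor.description)
            row = cursor.fetchone()
            print(row.v)  # JSON text, parseable with json.loads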